NSF PAR Search | NSF Public Access Repository

An In-Context Learning Agent for Formal Theorem-Proving

Thakur, Amitayush; Tsoukalas, George; Wen, Yeming; Xin, Jimmy; Chaudhuri, Swarat (October 2024, Conference on Language Modeling (COLM 2024))

PUTNAMBENCH: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

Tsoukalas, George; Lee, Jasper; Jennings, John; Xin, Jimmy; Ding, Michelle; Jennings, Michael; Thakur, Amitayush; Chaudhuri, Swarat (December 2024, Neural Information Processing Systems (NeurIPS 2024))

We present PUTNAMBENCH, a new multilingual benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PUTNAMBENCH consists of 1697 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the theorems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. Proving the theorems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PUTNAMBENCH to evaluate several established neural and symbolic theorem-provers. These approaches can solve only a handful of the PUTNAMBENCH problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PUTNAMBENCH is available at https://github.com/trishullab/PutnamBench.
PUTNAMBENCH: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

Tsoukalas, George; Lee, Jasper; Jennings, John; Xin, Jimmy; Ding, Michelle; Jennings, Michael; Thakur, Amitayush; Chaudhuri, Swarat (December 2024, Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks track)

PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

Tsoukalas, George; Lee, Jasper; Jennings, John; Xin, Jimmy; Ding, Michelle; Jennings, Michael; Thakur, Amitayush; Chaudhuri, Swarat (December 2024, Neural Information Processing Systems)

We present PutnamBench, a new multi-language benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1692 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the problems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. Solving PutnamBench problems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can solve only a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.
An In-Context Learning Agent for Formal Theorem-Proving

Thakur, Amitayush; Tsoukalas, George; Wen, Yeming; Xin, Jimmy; Chaudhuri, Swarat (October 2024, Conference on Language Modeling, 2024)

Programmatic Imitation Learning From Unlabeled and Noisy Demonstrations

https://doi.org/10.1109/LRA.2024.3385691

Xin, Jimmy; Zheng, Linus; Rahmani, Kia; Wei, Jiayi; Holtz, Jarrett; Dillig, Isil; Biswas, Joydeep (June 2024, IEEE Robotics and Automation Letters)
